Backdahl ef al. BMC Genomics 2014, 15:391 
http://www.biomedcentral.com/1471-2164/15/391 



(bmc 



Genomics 



RESEARCH ARTICLE Open Access 



Identification of candidate risk gene variations by 
whole-genonne sequence analysis of four rat strains 
connmonly used in inflannnnation research 

Liselotte BackdahT'^ , Diana Ekman', Maja Jagodic^, Tomas Oisson^ and Rikard Holmdahl^ 



Abstract 

Background: The DA rat strain is particularly susceptible to the induction of a number of chronic inflammatory 
diseases, such as models for rheumatoid arthritis and multiple sclerosis. Here we sequenced the genomes of two 
DA sub-strains and two disease resistant strains, E3 and PVG, previously used together with DA strains in genetically 
segregating crosses. 

Results: The data uncovers genomic variations, such as single nucleotide variations (SNVs) and copy number variations 
that underlie phenotypic differences between the strains. Comparisons of regional differences between the two DA 
sub-strains identified 8 genomic regions that discriminate between the strains that together cover 38 Mbp and harbor 
302 genes. We analyzed 10 fine-mapped quantitative trait loci and our data implicate strong candidates for genetic 
variations that mediate their effects. For example we could identify a single SNV candidate in a regulatory region of the 
gene 112 Ir, which has been associated to differential expression in both rats and human MS patients. In the APLEC 
complex we identified two SNVs in a highly conserved region, which could affect the regulation of all APLEC encoded 
genes and explain the polygenic differential expression seen in the complex. Furthermore, the non-synonymous SNV 
modifying aal53 of the Ncfl protein was confirmed as the sole causative factor. 

Conclusion: This complete map of genetic differences between the most commonly used rat strains in inflammation 
research constitutes an important reference in understanding how genetic variations contribute to the traits of 
importance for inflammatory diseases. 

Keywords: Rat genome. Whole-genome sequencing, QTL, Rheumatoid arthritis. Multiple sclerosis. Disease-associated 
SNVs 



Background 

The laboratory rat is a widely used model organism for 
studying both basic mechanisms of physiology, and dis- 
ease models of human diseases. Inbred rat strains and 
genetically segregating crosses are important tools in func- 
tionally proving disease mechanisms and represents a con- 
trolled setting where molecular interactions between 
different genetic factors as well as environmental factors 
can be tested [1]. 
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There are many common characteristics between 
chronic inflammatory diseases like Rheumatoid Arthritis 
(RA) [2], and Multiple Sclerosis (MS) [3]: these diseases 
are both consequences of dys-regulated immune re- 
sponses that has initially been triggered by a combination 
of genetic and environmental factors. Peripheral joints in 
RA and myelin sheets in the central nervous system in 
MS are the sites of persistent inflammation that leads to 
tissue destruction. The most highly associated genetic re- 
gion for the majority of these diseases is the Major Histo- 
compatibility Complex (MHC) and especially HLA II 
genes [4]. Despite recent progress in identifying loci that 
associate with complex diseases in human [5], the identifi- 
cation of the causative genetic polymorphisms, (both 
within and outside MHC), and the understanding of their 
pathophysiological roles, still represents a significant 
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challenge. The complexity of chronic inflammatory dis- 
eases lies not only in intricate interactions between genes 
and environment, but also in elaborate genetic phenom- 
ena such as gene-gene interactions, synergistic eff^ects and 
epigenetic mechanisms. 

Animal models provide a means to control environmen- 
tal factors and manipulate the genetic contribution. Al- 
though the disease causing polymorphisms are not likely 
to be identical with humans we expect the associated 
pathophysiologic pathways to be conserved. Using animal 
models could be a way to reveal their genetic complexity. 
In addition, the environmental exposure can be controlled 
and experimental manipulations are possible to perform. 
One other benefit of using animal models for genetic re- 
search is the possibility to selectively modify the genome 
e.g. producing congenic, knock-out or transgenic strains 
and linkage mapping. 

DA rats are remarkably susceptible to a number of auto- 
immune diseases; experimental arthritis induced with col- 
lagen [6] or hydrocarbons such as mineral oil [7] or 
pristane oil [8], experimental autoimmune encephalomy- 
elitis [9], experimental allergic neuritis [10], experimental 
allergic uveitis [11], and experimental autoimmune thy- 
roiditis [12]. Also among other inbred rat strains, the DA 
rat is the most prone to develop inflammation (Figure 1). 
This disease susceptibility is mediated by both MHC and 
non-MHC genes. It is the only strain susceptible to oil- 
induced arthritis and other arthritis inducers provoke very 
severe disease in DA compared to other susceptible strains. 
It has been shown that DA rats from different colonies dif- 
fer in disease susceptibility as well as contain distinct re- 
gions of genetic differences [13], therefore we decided to 
sequence 2 sub-strains of DA; DA/OlaHsd and DA/hanKini 
( here abbreviated as DA/O and DA/K). 

Genetic linkage studies using F2 animals from an 
inter-cross between DA and different disease resistant 
strains have identified about 50 quantitative trait loci 



(QTLs) associated with susceptibility to inflammatory 
diseases. These QTLs together cover approximately 20 
genomic regions in the DA rat that have been confirmed 
in more than one F2 inter-cross (Figure 2). The first F2 
cross that used DA as the inflammation susceptible 
strain was a DA/K x E3 cross in Pristane induced arth- 
ritis (PIA), that identified QTLs on chromosome 6, 4, 
12, and 14 [8]. Three more arthritis mapping studies and 
two in EAE utilizing DA/K x E3 F2 inter-crosses could 
verify the first four QTLs and identified new QTLs on 
chromosome 20 (MHC region), 10, 1, and 18 [15-19]. 
Dahlman et al. used a DA/K x ACI F2 cross in EAE to 
identify QTLs on chr 4, 7, 10, 12, 13, 15, 18, and X [20]. 
The QTLs on chr 4, 10, 12, 15 were later confirmed in 
DA/K X PVG.lAVl advanced intercross line (AIL) [21-26]. 
In 1998 two arthritis QTLs were identified in LEW.lAVlx 
PVG.lAVl on chr 4 and 10 that later were replicated in a 
DA/K X PVG.lAVl F2 [27,28]. In 2003 a LEW.lAVl x 
PVG.lAVl F2 cross identified two new QTLs on chromo- 
some 1 and 17 [29] that were replicated in two DA/K x 
PVG.lAVl AIL populations [30-33]. The PVG.lAVl is a 
congenic strain sharing the MHC region with DA, but is 
stUl resistant or develops extremely nuld disease in re- 
sponse to most inflammation inducers. The E3 strain is an 
independent inbred rat strain that is highly resistant to 
most inflammatory diseases but which tends to develop 
obesitas and has a coagulation defect [34] . Aiming to iden- 
tify genomic regions that differ between disease susceptible 
and resistant strains, DA/O, DA/K, E3/han and PVG/ 
lAVl.Kini were selected for sequencing. 

Since the main purpose of this sequencing effort has 
been to create a tool to be used in all genetic studies of 
the inflammatory rat we have also selected 10 quantitative 
trait loci (QTLs) that are ~ 1 Mb in size, to dissect and re- 
port all the variations in the interval. Main emphasis is 
put on non-synonymous SNVs (nsSNVs), synonymous 
SNVs (sSNVs), SNVs in UTRs, splice-site SNVs and SNVs 
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Figure 1 The genomic influence on disease susceptibility. The influence of genomic variations on the development of inflammatory disease 
by different genomes and MHC haplotypes. The illustration is an adaptation from [14]. 
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Figure 2 Arthritis and EAE regulating QTLs in the DA genome. Chromosome maps illustrating the genomic intervals of inflammatory disease 
regulating QTLs in the DA rat genome associated with mapping studies in DA/E3 or DA/PVG crosses and congenic strain. 



found in regions of high conservation between differ- 
ent species (PhastCons9way). The new SNVs that are 
identified will have to be analyzed in more detail and 
assessed experimentally. However, they represent an 
entry point for functional validation of candidate mo- 
lecular genetic mechanisms underlying established 
QTLs. 

The identified QTLs that segregate between DA and E3/ 
PVG signify regions of regulatory importance in the devel- 
opment of chronic inflammatory disease. Identifying the 
underlying factors within these disease-regulating regions 
would give important clues also for the diseases in human. 
The complete genomic sequence of the DA strain and the 
disease resistant strains E3 and PVG comprise invaluable 
resources in the study of the polygenic nature of complex 
diseases such as chronic inflammatory diseases. 



Results and discussion 

Sequencing and variation calling 

The four genomes DA/OlaHsd (DA/O), DA/hanKini 
(DA/K), E3/han (E3) and PVG/lAVLKini (PVG) were se- 
quenced in three separate runs using the SOLiD™ technol- 
ogy. Aiming to also identify structural variations we made 
three Mate-pair libraries with different insert sizes; 1 kb, 
2 kb, and 4 kb as well as one paired end library. The reads 
were then mapped to the BN (RGSC3.4) reference genome 
with BWA [35]. This resulted in median coverage of 18- 
22X (DA/O: 18X, DA/K 22X, E3 19X, PVG 19X) and en- 
abled us to cover >99% of the BN reference sequence with 
at least one read for all four genomes (Additional file 1). 

SNVs and short indels were identified using a strategy 
developed by Guryev et al [36]. To call a variant we re- 
quired a minimum coverage of three reads and at least 
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75% of the reads had to support the non-reference allele. 
This predicted about 5.3 million SNVs and 0.4 million 
short indels for all four genomes combined. 

By comparing our SNVs with the genotypes from the 
STAR consortium [37] we estimated the sensitivity to be 
about 96.8-97.8%. In addition, we sequenced 100 regions 
(300-500 bp in size) with capillary sequencing. Of the 
178 SNVs identified by capillary sequencing 171 SNVs 
were also identified by the SOLID sequencing, whereas 
two of the undetected SNVs were missed because less 
than 75% of the reads carried the non-reference allele. 
33 of the tested SNVs were in the coding sequence of a 
gene and all these variants was identified in the SOLID 
SNVs. Four of the undetected SNVs are in the hyper- 
polymorphic MHC region on chromosome 20. Further, 
no false positives were identified in these sequences. 

Variation analysis for the four genomes 
SNVs 

Each genome has about 3 million SNVs and 0.3-0.36 
million indels compared to the BN reference. Approxi- 
mately 70% of all SNVs are outside ref-seq annotated 
gene regions. About 1% of the SNVs are located in cod- 
ing regions, and 21% of the SNVs in coding regions are 
non-synonymous (Table 1). 

Pair-wise comparison identified that more than 98% of 
the high-quality (> 8X coverage) SNVs in DA/O are 
shared with the closely related DA/K. Analyzing the 
SNP distribution of the strains showed that about 14% 
of all SNVs identified are unique to DA/O and DA/K, 
compared to BN, E3 and PVG (Figure 3). Further, 29% 
of the identified SNVs are shared by all four genomes 
and differ only fi"om the BN reference sequence. A more 
detailed distribution of the SNVs can be found in 
Table 2. 

Stop codons 

Each strain has between 70-79 SNVs that results in a pre- 
dicted premature stop codon (Additional file 2). This affects 
a total of 131 genes, where about 60% are uncharacterized 
genes or olfactory and vomeronasal receptors. Further, in 
32 of these genes the stop codon was predicted in an alter- 
natively transcribed gene, and not affecting all annotated 

Table 1 SNVs and short indels (< 10 bp) compared to the 
BN reference 

Strain SNVs SNVs in SNVs in NS coding Short Indels 







gene regions 
(± kb) 


cds 


SNVs 


indels 


in cds 


DA/O 


3.02 M 


0.87 M 


30440 


6639 


0.34 M 


1206 


DA/K 


2.94 M 


0.86 M 


31130 


6706 


0.30 M 


1243 


E3 


3.09 M 


0.91 M 


31426 


6432 


0.36 M 


1294 


PVG 


2.99 M 


0.86 M 


31318 


6432 


0.30 M 


1241 



Cds = Coding sequence. 




Figure 3 Sequence similarities/differences between DA, E3 
and PVG. Similarity is given as the percentage of SNVs shared 
between combinations of strains out of all SNVs, after excluding 
1.01 M positions where at least one strain is heterozygous or has a 
coverage below 3. The number of SNVs kept for the analysis was 
4166021 . Total numbers of SNVs are in parenthesis. 

transcripts from the gene. 33 of the stop introducing SNVs 
were found in all 4 strains, whereas 56 are only present in 
one strain, with 23 unique for DA, 14 for PVG and 19 for 
E3. Furthermore, there is one stop gaining unique to the 
DA/K strain in the Grbl4 gene. 

Structural variants 

Next, we searched for structural variants between the 
strains. We detected 45 duplications in the range of 0.4- 
149 kbp, and 96 deletions between 1.8-134 kbp between 
DA/O and E3 (Additional file 3). Deletions were predicted 
to affect 96 genes in DA/O and 47 genes in E3. Fewer 
genes (42) were affected by deletions in DA/K compared 
to PVG and 36 genes in PVG compared to DA. The large 
number of gene deletions in DA compared to E3 is to a 
large extent due to deletions on chromosome 7 and 15, 
most of which are also deleted in PVG. 

Correlating SNVs in different genomic features with 
differential expression and alternative splicing 

Aiming to dissect the influence from SNVs occurring in 
different coding or transcript regulatory sequences such as 
splice-sites or UTRs, on differential expression and spli- 
cing, we compared the occurrence of SNVs within a cer- 
tain gene with the expression of that gene. Intragenic 
SNVs were compared to a set of differentially expressed or 
alternatively spliced genes from a study by Gillett et al. 
[38]. In this study differential expression and alternative 
splicing was analyzed in RNA from lymph node cells from 
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Table 2 A summary of identified high-quality SNVs (coverage > 8) between, DA/K and DA/O, DA/K and PVG and DA/0 
and E3 and the distribution of SNVs across genomic features 





DA/0*DA/K 


DA/0*E3 


DA/K*PVG 


DA/0-l-/K*E3=PVG 


BN*DA/0-l-/K-l-E3-l-PVG 


Stop gained 


1 


48 


37 


22 


10 


Stop lost 


0 


3 


2 


1 


2 


Non-synonymous 


108 


3688 


3374 


1628 


1437 


Synonymous 


224 


7398 


7546 


3193 


2770 


Essential splice site 


1 


41 


36 


23 


20 


splice site 


21 


1013 


973 


451 


401 


5'prime UTR 


21 


644 


692 


270 


231 


3' prime UTR 


113 


4761 


4481 


2029 


1860 


ntronic 


13949 


436591 


347893 


1 80040 


171296 


Uppstream 


1322 


49342 


40423 


19954 


18148 


Downstream 


1215 


41414 


34053 


16472 


14919 



DA/K and PVG rats 7 days after induction of EAE. They 
detected 13 genes with convincing evidence of differential 
spUcing between DA and PVG, of which three (Prexl, 
Itpr2 and Nabl) have predicted splice site variants that are 
unique to PVG. Further, 11 had coding SNVs. Naturally; it 
needs to be further investigated if these are the variants 
that lead to the altered splicing. 

To assess the correlation between differential expres- 
sion or splicing and SNVs on a larger scale, we looked 
for SNVs in coding or UTR regions as well as in splice 
sites in all genes with differential expression or splicing 
between DA/K and PVG in the Gillett et al. study and 
compared this with how often such SNVs occur in genes 
that were not differentially expressed or spliced. There 
was a significant enrichment of genes with SNVs in 
UTRs and coding sequences among the genes that dis- 
played differential expression compared to genes that 
did not (Figure 4a). Due to the multiple probe design of 
the Affymetrix arrays, the coding SNVs should not in- 
fluence the actual hybridization to the array and thus 
the difference should be reflecting the biology [39]. 
Similarly, there was an enrichment of genes with SNVs 
in splice sites, UTRs and coding regions in alternatively 
spliced genes compared to the genes where no alterna- 
tive splicing was detected (Figure 4b). This suggests that 
coding SNVs as well as SNVs in UTRs and in splice 
sites are more frequent in genes that display differen- 
tial expression or alternative splicing. However, there 
are still many genes without the variants that are dif- 
ferentially regulated. Hence, the absence of such vari- 
ants cannot be used to exclude a gene as a candidate. 
In addition, there are additional types of genomic vari- 
ations, such as larger structural variants like partial 
and whole gene duplications, which also need to be an- 
alyzed, since they can have a major impact on gene ex- 
pression and splicing [40]. 



Pairwise comparison of disease susceptible and 
resistant genomes 
Comparing two DA sub-strains 

There are more than 1 million of the identified SNVs be- 
tween DA and E3 and between DA and PVG that could 
contribute to the phenotypic difference. However, even be- 
tween the very similar DA/O and DA/K there is a differ- 
ence in sensitivity to arthritis induction [13]. Looking closer 
at the polymorphic regions between DA/O and DA/K we 
could show that the strains differ primarily in regions on 
chromosomes 1, 2, 3, 7 and 13, which is in complete agree- 
ment with the results by Rintisch et al. However, we were 
able to better define the intervals, and two of the regions 
could be divided in two distinct regions each and we also 
found a new region on chromosome 5 that differs between 
these sub-strains (Figure 5 and Table 3). In total, these eight 
regions include approximately 38 Mbp and contain 302 
genes of which 78 have nsSNVs or stop codon variants. 
Congenic data showed no effect on disease sensitivity from 
the region on chromosome 3 [13], hence the variants(s) 
causing the different arthritis susceptibility are more likely 
to be located on the other chromosomes, harboring in total 
122 genes (Table 3). 

To infer if the regions of differences are contaminations 
from earlier crosses or hyper polymorphic regions, we com- 
pared the SNVs in the regions with the STAR genotypes of 
a number of other strains [37]. SNV comparisons show that 
DA/K or DA/O displays an alternating pattern of identical 
sequence to ACI in each of the identified intervals (Table 3). 
The patchy pattern of the diverging regions between the 
two DA strains is most likely the effect of the strains being 
separated and inbred separately before the DA strain was 
completely homozygous for all chromosomes. 

The origin of the DA strain is often reported to be 
unknown but it has been documented to be derived 
from a stock from a Dr. T.T. Odell, Jr. at Oak Ridge 
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Figure 4 The association of SNVs in different gene regions and differential expression or splicing. Association of SNVs in different gene 
regions and differential expression or splicing, as reported in Gillett et al. [38]. The fractions of genes that have SNVs in i) coding region, ii) UTR or iii) splice 
sites were measured, a. Genes predicted to be differentially expressed between DA/K and PVG compared to genes with no differential expression in this 
condition. Three different levels of fold change were used to categorize differentially expressed genes, resulting in 88 genes with fold change (FC) > 2, 214 
genes with FC > 1.7, 520 genes with FC > 1.5 and 14140 genes with no differential expression, b. Alternatively spliced genes compared with genes with 
no evidence of alternative splicing in this condition. Three significance levels were used to categorize significant alternative splicing in genes, resulting in 
123 genes in the highest significance category p < 0.00001, 278 (p < 0.01), 613 (p < 0.1) and 13208 genes that were not considered alternatively spliced. 



National Laboratory, Tennessee. In 1957 Dr. Odell de- 
scribes an agglutination test with an experimental set 
up using animals from a cross between two inbred 
lines that fits very well with what we now know about 
the DA strain [41]. The first strain for the initial inter- 
cross is an Irish coated strain that had been back- 
crossed for 47 generations that express the D antigen 
on their erythrocytes. This strain was most likely ACI 
(see Table 3). The second inbred strain in the inter- 
cross expressed the C antigen instead and had only 
been backcrossed for nine generations. Some sugges- 
tions of the C antigen strain are early generations of 
the E3, BUF or LEW strains, which carry this allele. 
The Fl progeny was then backcrossed to either the D 
or the C strain. The D/D backcross population is most 
likely the founder of the D-antigen expressing Agouti 
coat color (i.e. DA) strain. Inbreeding was at inter-cross 
generation 19 in about 1965. However a subset of the 
DA colony was sent from Wistar to Oxford already in 
1963, where it was further inbred. Thus, the two DA 
sub-strains were separated at generation <19 before the 
chromosomes had become homozygous which explains 



the dissimilar regions. It is also possible that more DA 
sub-strains exist in laboratories throughout the world. 

Functional comparison - genomic variance within 
inflammation associated QTLs 

Dissecting the genomic influence on inflammation re- 
quests a lot of phenotypic data. Both linkage studies and 
congenic mapping have identified '-20 genomic regions in 
the DA rat associated to inflammatory disease (Figure 2). 
The variation analyses of smaller QTLs, approximately 1 
Mb in size, are described in more detail below and a sche- 
matic overview of all inflammation QTLs identified in 
comparisons of the three genomes irrespective of interval 
size are described in Table 4. 

Chromosome 1 

Chromosome 1 harbors two regions of QTLs for both 
EAE and PIA, [16,18,29,31,33,42], The fine-mapping of 
the two QTLs down to a shorter genomic interval has 
been done in EAE. 

One QTL region Eea29/Pia8 is positioned close to the 
centromere and this region regulates both EAE and PIA. 
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Figure 5 DA/0 

SNV density in 8 



Beyeen et al. fine-mapped this QTL in EAE to less that 3 
Mb using an AIL and they also validated the regulatory 
effect using a 25 Mb congenic strain [31]. The refined 
Eae29/Pia8 contains 18 genes. Beyeen et al. could show 
differential gene expression in the genes IL22RA2 and 
IFNGRl and association to IL22RA2 could be shown in 
a cohort of MS patients. Therefore these two genes are 
strong candidates as regulators of inflammatory disease 
and thus it is important to study all variants in the re- 
gion for regulatory effects and if they coincide with con- 
served regions. Comparing SNVs between the strains 
DA and PVG in the shorter Eae29 region displayed very 
little variation between DA and PVG. Surprisingly, only 
16 SNVs were identified between 14.5-15.1 Mb. IL22ra2 
had one SNV in the second intron and Ifngrl had one 
SNV in the first intron. To analyze if the SNV is in a 
conserved segment and thus may be coinciding with a 
regulatory element, the conservations score for the SNVs 



was assessed. The SNV is considered to be conserved if the 
score is higher than 0.1, (http://hgdownload.soe.ucsc.edu/ 
goldenPath/rn4/phastCons9way/). Three of the SNVs in 
Eae29 coincide with highly conserved regions (Figure 6a). 
Further comparison of these SNVs to data from eight in- 
bred rat strains [56] identified two SNVs in conserved re- 
gions to be unique for the DA stain and the SNV in Il22ra2 
is unique for DA and one of its ancestral genomes ACI 
(Additional file 4). 

The second QTL on chromosome 1 has been fine 
mapped in both EAE and PIA from an interval between 
176-218 Mb and was resolved into two distinct QTLs 
EaeSO and EaeSl. EaeSl is positioned to a region of less 
than 1 Mb around 185 Mb and Eae30 to 1 Mb close to 
128 Mb [33]. EaeSl was fine-mapped in both EAE and 
PIA to five candidate genes and the region showed evi- 
dence of association to human inflammatory disease to a 
SNV in the IL21R. No coding polymorphisms could be 



Table 3 DA/O and DA/K differentiating regions 



Chr. 


Start 


End 


DA/O 
similar to: 


DA/K 
similar to: 


SNVs in 
region 


Genes in region 


Genes with 
NS SNVs 


Polymorphic genes 


1 


225.91 


229.22 


ACI 


F344, BUF 


3871 


21 


2 


Kanki, ENSRNOG00000023843. 


2 


201.5 


207.21 


LE 


ACI 


9084 


74 


14 


Dennd2d, Prokl, Slcl6a4, Gstm7, Clccl, Prmt6. 




218.22 


219.01 


ACI, E3 


F344, BN 


1482 


6 


3 


Arhgap29, Abca4, Dnttip2. 


3 


34.7 


52.32 


ACI 


» (F344) 


24496 


80 


20 


Slc39all, Galnt5, Upp2, L:y75, Dpp4, Grb14 (Stop). 




73.21 


78.08 


ACI, F344, BUF 


* (WKY) 


6287 


100 


33 


C1qtnf4, Mybpc3, 01726 (Stop), Acp2, Pacsin, Dgkz. 












Olfactory: 


56 


26 














Other: 


44 


7 




5 


7.1 


8.06 






1123 


1 


1 


Depdc2 (Prex2) 


7 


45.76 


48.5 


SHR, F344, WKY 


ACI 


4352 


13 


3 


Ptprq, ENSRNOG00000030316 (OtogI). 


13 


33.3 


35.6 


ACI, SHR 


PVG, E3, BUF 


5250 


7 


2 


ENSRNOG00000030901, DpplO. 
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Table 4 A summary of all coding SNVs in EAE and arthritis regulating QTLs (Continued) 
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Eae29 
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Figure 6 Snapshots from the UCSC genomic web-browser describing the genomic locations of the identified SNVs within 4 QTLs or 
specific genes in QTLs (a-d). The SNVs are idepictecJ as reid or blue bars in tlie top of the illustration. The number next to the bar corresponids 
to the SNPs incjividual number in the series of all SNPs in the QTL. Blue bars in(dicate SNVs with conservation score higher than 0.1, in red are 
SNVs less that 0.1 . The mid segment of the snapshot contains the genes in the interval and below is the degree of inter-species conservation 
illustrated as blue and green bars. 



identified in the IL21R gene in the rat, but the IL4RA 
gene had a spUce site SNV. Four SNVs are unique to 
DA (Additional file 4). In addition, two other genes in 
the region, Gtf3c and LOC361646, displayed numerous 
nsSNVs. Further, looking at SNVs in conserved regions 
identified 5 potentially regulating SNVs (Figure 6b). The 
IL4RA gene also contains two sSNVs, whereas the IL21R 
gene contains one potentially regulating SNV in an intron 
(conservation score 0.45). 



In the Nohra et al. study [33], an additional 1.1 Mb EAE 
specific QTL EaeSO close to the RGMA gene was identi- 
fied in a GIG AIL. Interestingly, there was suggestive asso- 
ciation to the human RGMA gene also in an MS cohort. 
There are 48 polymorphisms between DA and PVG in 
RGMA. Ten of the variants are of potential interests; three 
SNVs in conserved regions and one synonymous SNV in 
the last exon. Four SNVs in the region are unique for DA/ 
ACI (Additional file 4). 
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Chromosome 4 

Rat chromosome 4 is one of the most QTL dense re- 
gions in the DA genome. Three regions have been Unked 
to EAE and arthritis. The QTL PiaS/Eae24-27 have also 
been hnked to experimental diabetes [57] and uveitis 
[58]. There is also evidence that the chromosome has re- 
gions of epistatic genetic interactions [43] . 

The original Pia5/Eae24-27 locus is more than 20 Mb 
and covers 250 to 500 genes depending on the study and 
disease model [16,43]. Among the genes in this region 
are TCRBV, Ig Kappa genes and IL23R, with potential 
profound importance in immune regulation. In 2010 
Marta et al. used a combination of congenic lines as well 
as an advanced intercross line to dissect disease regula- 
tion in this region. They showed that the effect on dis- 
ease regulation was larger than the effect from any of 
the smaller congenic strains, however one small (1.25 
Mb) congenic strain (R23) mediates a major part of the 
regulatory effect from this region. No coding variations 
could be identified between DA and PVG in this region, 
which indicates that the disease regulating variation is 
regulatory. Indeed, this region is very conserved between 
species and of the 1810 SNVs in the interval 188 coin- 
cided with conserved genomic intervals. Not surpris- 
ingly, many of the genes in the region have important 
function in embryonic development. 

Next, the arthritis QTL Oia2/Pia7 at 159.46-160.0 
Mb is known to be a strong regulator of adjuvant- 
induced arthritis such as OIA or PIA [42,45,46]. Fur- 
ther, this arthritis QTL coincides with the EAE QTLs 
EAae20-21 [24]. The region harbors the APLEC com- 
plex, which encodes a set of C-type lectin receptor 
genes expressed on antigen presenting cells. Most 
of the APLEC genes have been shown to be differen- 
tially expressed after stimulation with a number of 
immune-triggering stimulants [45,59]. There are a 
number of coding polymorphisms in the APLEC com- 
plex encoded genes. One generates a premature stop- 
codon in Clec4b2 in the DA genome and is certainly a 
plausible candidate to mediate the disease phenotype. 
However, four other APLEC encoded genes have non- 
synonymous SNVs between DA and E3/PVG. Further- 
more, a total of 832 SNVs could be identified in the 
550 kb consensus region and 33 polymorphisms lie 
within conserved regions in or close to genes. Neither 
of the SNVs in the APLEC complex are unique to DA 
but are shared between at least five other inbred ge- 
nomes (Additional file 4) [56]. One region at 159.955 
kb has the maximum conservation score = 1, accord- 
ing to the phastCons9way, and could be a regional en- 
hancer for the APLEC genes in the region. This region 
harbors two SNVs. This region needs to be analyzed 
further to conclusively determine weather this is an 
enhancer or not. 



Chromosome W 

Chromosome 10 harbors at least four susceptibility genes 
for experimental inflammatory disease. Proximal to the 
centromere is the VrallCiita gene locus [47,48], which is 
part of a haplotype identified to regulate levels of MHC 
class II expression, and which was also reported to be as- 
sociated with chronic inflammatory disease [48] . The vari- 
ant underlying the effect has however not been identified. 
Ciita contains no nsSNVs between DA and PVG. How- 
ever, there are three synonymous coding SNVs that may 
influence the expression levels of which two are unique 
for E3/PVG (Additional file 4). In addition, there are a 
number of polymorphisms in conserved regions with po- 
tentially regulatory effect on the gene. The 19 kb upstream 
promoter of Ciita harbors two very conserved regions 
with 10 SNVs of which one is unique to E3/PVG. 

The mid-part of chromosome 10 has been associated 
with both EAE and arthritis Eael8/Cia5/Oia3 [25,50,60]. 
Eael8 QTL was fine-mapped using two AILs, G7 and GIO, 
which divided the QTL in two QTLs, EaelSa and EaelSb 
[25]. EAElSa was fine-mapped to 1.1 Mb harboring 13 
genes at 58.2-59.3 Mb [22]. Two nsSNVs in Fam64a and 
ENSRNOG00000037371, as well as two SNVs in splice- 
sites of RGD1304728 and Smtln2 were identified in the 
EaelSa locus, in addition to several potentially regulating 
SNVs. 

EaelSb was fine-mapped to a 0.88 Mb region between 
70.2-71.0 Mb in a GIO AIL [49]. Two genes in this region 
Ccl2 and Cclll showed differential expression. Interest- 
ingly, differential expression of Ccl2 has also been ob- 
served between MS patients and controls. However in a 
functional characterization study of the EaelSb congenic 
strain, Cclll was conclusively identified to be regulating 
many different disease pathways in this strain [30]. Cclll 
has six SNVs in the 3 'UTR of which four are in conserved 
sequences; the gene also has one conserved intronic SNV 
and one synonymous SNV (Figure 6c). 

In experimental arthritis a QTL has been identified in 
both DA/PVG and DA/F344 crosses on chromosome 10 
[14,50,60]. This arthritis specific QTL was fine-mapped 
in pristine-induced arthritis using an AIL GIO [14]. Also 
for this QTL the locus was separated in to two regions. 
Interestingly the second 1.5 Mb region coincided with a 
region with Cd300 like genes, which contain distinct re- 
gions of very high variation between DA and the other 
strains (Figure 6d). The aggregation of SNVs in the 
vicinity of CdSOO genes may be an indication of struc- 
tural variations in the region and needs to be further 
studied, however these genes could have a significant 
influence on the regulation mediated by this region. 
Interestingly this locus was not mapped in any of the 
DAxE3 F2 crosses, which suggests that the regulating 
polymorphism(s) could be unique to the PVG strain. In 
the other region Pia30, 114 SNVs were identified that 
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segregated E3/DA and PVG alleles, of which most are 
intronic or intergenic SNVs in or near the Rpl38 and 
Ttyh2 genes. 

Chromosome 12 

The first arthritis regulating QTL to be cloned and posi- 
tioned to one polymorphism is the Pia4, which was identi- 
fied to be regulating chronic inflammation by a non- 
synonymous SNV in the Ncfl gene, altering the amino acid 
at position 153 [52]. It was originally mapped using both 
PIA {Pia4) [8] and EAE models (EaeS/Eaell) [18,23]. The 
gene was identified using a congenic strain with E3 alleles 
between 23.476-23.730 Mb and harbors three genes. Here, 
a total of 377 SNVs were detected between DA and PVG/ 
E3 in the Pia4 region. Five of them are coding variants, of 
which one is non-synonymous and one is annotated to be 
in a splice site of Gtf2i. Moreover, 7 SNVs are located in 
UTRs of which 5 are in the 3'UTR of the Ncfl gene, but 
they have low conservation scores. Five SNVs are in poten- 
tially conserved regions, but only one of them, the nsSNV, 
is in Ncfl. Hence, the genome sequences support the previ- 
ous findings that the inflammation regulatory effect from 
this congenic is mediated by the Ncfl nsSNV [61]. 

Conclusions 

DA rats are particularly susceptible to the induction of a 
number of chronic inflammatory diseases. There is a 
modest difference in arthritis susceptibility between the 
two strains where DA/O is slightly more susceptible. 
Here we identified eight polymorphic regions between 
two substrains of DA; DA/K and DA/O together cover- 
ing 38 Mb of genomic sequence. Of the variable regions, 
23 Mb are located on chromosome 3 and by using a 
DA/0.chr3DA/K congenic strain these regions were ex- 
cluded as important in arthritis regulation [13], and thus 
the regulatory effect should come from the remaining 
15 Mb variable regions between DA/O and DA/K. 

According to the rat genome database more that 50 in- 
flammatory disease related QTLs have been identified in 
the DA rat. Here we have used the next-generation sequen- 
cing of the DA genome and inflammatory disease resistant 
E3 and PVG to investigate 10 fine-mapped QTLs and 
closely catalogue the complete set of variations in these re- 
gions. This study has identified a number of interesting 
non-synonymous or regulatory variants within these QTLs. 

For the already positionally cloned gene, Ncfl, in the 
Pia4 QTL on chromosome 12, no additional regulatory 
SNVs were identified in the proximal region of Ncfl that 
are likely to contribute to the regulation of inflamma- 
tion in the region. Hence, this further enhances the im- 
portance of the previously detected nsSNV (coding for 
aal53 of the protein) in the regulation of experimental 
inflammatory disease. 



Two of the inflammatory-disease regulating regions 
analyzed here; the Vml/Ciita and £<3:e25/R23 loci are 
completely devoid of coding SNVs, but do contain sev- 
eral SNVs in conserved regions, indicating regions of 
regulatory function. 

Four of the QTLs {Eae29, EaeSO, EaeSl and EaelSb) 
have strong candidate genes that have been identified to 
be differentially expressed between DA and PVG as well 
as associated to chronic inflammatory disease in human 
cohorts. These gene-regions were analyzed for regulatory 
polymorphisms. Eae29 has three very good regulatory 
SNV candidates that should be assessed for disease regula- 
tion in the vicinity of IL22ra2. The expression of IL22ra2 
has been demonstrated to differ between the strains impli- 
cating difference in regulatory regions [31]. IL22RA2 is 
now one of the well-established MS risk genes [62]. EaeSO 
has one non-synonymous SNV in a topoisomerase like 
gene, however this gene may not be the disease-regulating 
gene in the interval. RGMA, which is the top candidate 
gene in the region, contains 4 conserved regions with pos- 
sible regulatory SNVs. The top candidate gene for the 
EaeSl locus is Il21r. This gene has only one identified 
SNV with potential regulatory function in the first intron 
of the gene. Eae31 also contains a splice site SNV in the 
ll4ra gene plus two potential regulating SNVs that also 
should be tested for EAE regulation. The QTL EaelSb 
harbors the Cclll gene that have shown strong immune- 
regulatory effects in inflammatory disease [30]. Cclll con- 
tains five highly conserved SNVs and particularly four 
SNVs in the 3'UTR are strong candidates to be regulators 
of differential expression. 

The APLEC locus on chromosome 4 harbors five nsSNVs 
in four genes. This region contains very few conserved re- 
gions, however a new impending regulatory region 50 kb 
upstream of the last gene (Mincle) in the complex was 
identified that contains two SNVs in the middle of the con- 
served region. This region has to be analyzed further to 
fully comprehend if and how it could regulate all genes in 
the region. 

A hyper-variable region in PiaSO on chromosome 10 
was also identified. This region contains four SNV dense 
regions of about 10 kb in size each. These variable re- 
gions occur in the extended 3'UTR of CdSOO genes, 
CdSOOh and CdSOOf in particular. This enrichment of 
SNVs in a short interval may perhaps be explained by 
sequence duplications/ divergence. 

By combining information for the most interesting SNVs 
in the herein analyzed QTLs with SNV information from 
an eight genome sequencing study of inbred rats [56], we 
identified a number of SNVs unique for inflammation sus- 
ceptible DA and the inflammatory disease resistant PVG 
and E3 strain. Three QTLs had DA or DA/ACI unique 
SNVs and one had E3/PVG unique SNVs. Out of the eight 
QTLs that were used in this comparison only four had 
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unique SNVs. This is somewhat expected since one of the 
few cloned chronic inflammatory disease regulating SNV, 
in the Ncfl gene, also is present in the inbred strains BUF, 
F344 and WKY. The strain BUF is susceptible to both arth- 
ritis and EAE whereas both F344 and WKY are resistant to 
experimental arthritis. The disease regulation mediated by 
this SNV is however very strong and has been confirmed 
to be disease regulating also in mice. This demonstrates 
that additional protective genes in the genome may mask 
even very strong disease enhancing SNV effects. 

In a genotype to phenotype perspective, the combined 
effect from the QTLs has profound functional implica- 
tions. In short, both the adaptive and innate immunity is 
affected; The Ncfl gene is involved in the first defense 
against infections and triggering the immune system. 
Both the CD300 and the APLEC encoded genes are im- 
portant pattern recognition receptors involved in antigen 
uptake and Ciita is important in regulating antigen pres- 
entation. The chemokine CcUl and the cytokine receptors 
I121r and I122ra2 are important in modulating the immune 
responses leading to inflammation. I121r and I122ra2 are 
particularly important regulators of activated T cells. In 
conclusion the 10 QTLs represent regions of important 
disease regulation and the combined effect explains a sub- 
stantial part of the pro-inflammatory phenotype that is 
characteristic of the DA strain. 

The full genomic sequence of DA/O, DA/K, E3 and 
PVG constitutes an invaluable resource in the understand- 
ing of how polymorphisms in the DA genome contribute 
to disease susceptibility and this genetic blueprint of an in- 
flammation permissive rat strain will also be important in 
the understanding of the human etiologies for chronic in- 
flammatory disease. 

Methods 

Next-generation sequencing 

Liver samples from 1 female rat per strain was used for all 
runs. The sequences were generated in 3 separate runs. 
The first run (august 2009) was achieved using a SOLiD'"3 
sequencer with 1 full slide per sample. The Mate-pair li- 
brary protocol was used to generate 2 x 50 bp reads, with 
2-3 kb insert size. A Hydro-shear was used for fragmenta- 
tion. The libraries were produced after 11 cycles of amplifi- 
cation. The second run (September 2010) was performed 
using a SOLiD"'4 and generating 2 Mate-pair libraries per 
genome, one with 1 kb insert size and one with 4-4.5 kb 
insert sizes. 54 of a slide was used for each library and the li- 
braries were amplified 10-15 times. The DNA samples 
were fragmented using a Covaris S2. The last sequencing 
run using the Fragment library protocol was done on a 
SOLiD™5500 (June 2011). 3 lanes of 75 X 35 bp reads were 
run per sample. Fragmentation to 150-200 bp was achieved 
by a Covaris S2 with 6 cycles of amplification. All libraries 



in the study were quality controlled on an Agilent Bioanaly- 
ser High sensitivity Chip + qPCR. 

Mapping 

The reads were aligned to the BN reference assembly 
(Rnor3.4) with the BWA (vO.5.9) aligner (-c -1 25 -k 2 -n 
10). Mapped reads from all libraries were merged before 
SNV calling. The yield of sequence mapping to the genome 
was 48.9 Gb for DA/O, 50.8 Gb for DA/K, 59.9 Gb for E3/ 
han and 51.8 Gb for PVG/lAVl.Kini (Additional file 1). 

SNV/lndel calling 

SNV and short indel calling was done with a strategy de- 
veloped by Guryev [36] based on Samtools pileup [63]. 
Duplicate reads (starting at the same position) were dis- 
carded. Next, each SNV should be supported by at least 
three calls with a quality > 10. In addition, we used only 
SNVs where at least 75% of the reads support the non- 
reference allele (40% for indels). Finally, SNVs that dif- 
fered between the reference and the resequencing of the 
original reference rat (Eve), were removed since these 
are likely to be errors in the reference. This resulted in a 
total of 5.2 M SNVs and 0.66 M short indels (< 10 bp), 
of which about 57% were deletions and 43% insertions. 
A "high-quality SNV set" was compiled of SNVs with at 
least 8 X coverage. About 97% of the genome was covered 
by at least 3 calls and 85% by 8 or more. 

The functional annotations of all SNVs were predicted 
with the Ensembl Variant Effect Predictor [64]. 

These strains have been inbred for many generations and 
should be homozygous at nearly all positions. Therefore we 
excluded approximately 0.9 M positions per strain where 
25-75% of the reads supported the variant allele. Almost 
10% of these variants are detected in regions with high 
coverage (>30X), compared to <1% for homozygous SNVs, 
and may be due to duplications, as suggested previously 
[40]. Further, we found that 7% of the heterozygous SNVs 
in DA/O (coverage 9-25X) were called as homozygous 
SNVs in DA/K, whereas 1% were called as reference. 
Hence, some heterozygous SNV calls are likely true SNVs 
which are missed due to low coverage and errors in se- 
quencing/ alignment. 

To estimate the SNV calling sensitivity we compared 
our SNVs with a set of 20284 SNVs typed by the STAR 
consortium [37]. In this set, 11241, 11037 and 11253 
SNVs differed between BN and DA, E3 and PVG, re- 
spectively, of which, we detected between 96.8-97.8% in 
all strains. We also sequenced a number of SNV dense 
regions with conventional Sanger sequencing. Out of the 
178 SNVs detected by Sanger sequencing, 171 could be 
detected by the SOLID sequencing, whereas two were 
predicted by less than 75% of the reads and thus were 
false negatives. No false positives were found with 
SOLID in these regions. 
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The sensitivity of indel calling could not be estimated 
using STAR genotypes and the Sanger sequenced re- 
gions did not contain any indels, however it is likely that 
the sensitivity is substantially lower. Indeed, we found 
that only 66% of the indels called in DA/O were con- 
firmed in DA/K. 

CNV calling 

Copy number variation (CNVs) due either to deletions 
or duplications of regions, were predicted with DWAC- 
seq version 0.56 to identify CNVs. This program counts 
the reads in a window of dynamic size and compares 
these counts for two strains. We counted only uniquely 
mapped reads and used a ratio below 0.3 for deletions 
and above 1.6 for duplications. To better distinguish 
CNVs from repetitive regions, we only allowed regions 
with one strain having a mean coverage around the 
mean genome coverage (±0.5* mean coverage). Further, 
regions were only predicted as deletions if the number 
of bases lacking coverage differed by at least 10% be- 
tween the strains. 

Ethics statement 

All experiments in this study were approved and performed 
in accordance with the guidelines from the Swedish Na- 
tional Board for Laboratory Animals and the European 
Community Council Directive (86/609/EEC) under the eth- 
ical permit N332/06entitled 'Genetic regulation, pathogen- 
esis and therapy of EAE, an animal model for multiple 
sclerosis', which was approved by the North Stockholm 
Animal Ethics Committee (Stockholm snorra djurforsokse- 
tiska namnd). Rats were tested according to a health- 
monitoring program at the National Veterinary Institute 
(Statens Veterinarmedicinska Anstalt, SVA) in Uppsala, 
Sweden. 

QTL analysis 

QTLs smaller than 1 Mb in interval were selected for 
finer SNV analysis. SNVs were visualized using the add- 
custom track function in the UCSC web browser. The 
conservation scores (phastCons9way) where downloaded 
from the UCSC. http://hgdownload.soe.ucsc.edu/goldenPath/ 
rn4/phastCons9way/. In order to retrieve information on 
unique SNVs in the smallest QTLs, conserved SNVs or 
nsSNVs in the QTLs were compared to eight inbred rat 
strains [56] in a 13 genome comparison. 

Accession numbers 

The sequence data for all four genomes will be available 
from the European Bioinformatics Institute short read 
archive under the accession number: ERP0004908. All 
variants identified in this study will be available from the 
rat genome database (http://rgd.mcw.edu). 
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Additional file 1: Yield and coverage for sequence libraries. Table 
describing the different libraries that were created, and the yield and 
genome coverage from sequencing. 

Additional file 2: Stop-gained or stop-lost nsSNVs. Table with nsSNVs 
that generates stop-gained or stop-lost. 

Additional file 3: Insertions and deletions. Table describing insertions 
and deletions identified in the sequencing study. 

Additional file 4: A 16 rat strain SNV analysis within eight 
inflammation associated QTLs. Table including SNV information for 9 
rat strains [56] within eight of the smallest QTLs discussed in the Results 
section. The SNV that are included in the table are all identified nsSNVs, 
sSNVs splice-site variants and conserved SNVs. 
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